ABSTRACT

Focusing on the assessment needs of language minority students in
the early elementary years and on the evaluation of programs
servicing them, this discussion directs specific attention toward
accommodating language minority students in the New Follow Through
Program. Introductory remarks offer recommendations for developing
New Follow Through models for culturally and linguistically
integrated settings and for developing tests for English-proficient
and limited-English-proficiency children. The first major section
describes the state of the art in assessing language minority
students. Several ways tests are misused are pointed out, and
language proficiency assessment, testing school achievement, and
teacher assessments are discussed. The second major section examines
variables thought to be important in describing programs for
language minority students and in studying the relationships of such
programs to various student characteristics and local conditions.
Model, program, classroom, and student variables are specified and
discussed in terms of problems associated with instrumentation and
measurement and with respect to measuring variables of interest. The
final section identifies problem areas associated with the
evaluation of bilingual programs. It is concluded that the inclusion
of language minority students in the New Follow Through Program
poses challenges and opportunities for curricular, psychometric, and
evaluative innovation. (RH)

Introduction and General Comments

This paper focuses on the assessment needs of language minority students
in the early elementary years and on the evaluation of programs which
service them. If, as proposed, language minority students are to play a
significant role in the New Follow Through Program (New FT), then the
importance of addressing their measurement looms large indeed, for without
methodological refinement in instrument design and selection most of the
achievement and affective data collected will probably be worthless, much
will undoubtedly be suspect, and only a little will clearly merit the cost
of analysis.

Today the importance of properly assessing bilingual students is
recognized because of their growing numbers, geographical dispersion, the
influence of numerous federal and state programs specifically designed to
meet their needs, the impact of federal court cases, and the voice of the
bilingual educational constituency. It is becoming profitable to produce
tests for this market, and a number of tests have been developed to measure
a variety of language and achievement constructs for bilingual or potentially
bilingual children. Unfortunately, the quality of most of these instruments,
particularly instruments which measure aspects of language proficiency,
leaves much to be desired (Bernal, 1977).

Psychometrics as a field is reluctantly becoming aware of the challenges
to test validity posed by children for whom tests in English alone have so
far inadequately assessed their aptitudes, attitudes, achievement, and
development. Obviously, many bilingual children can be tested appropriately
with extant English instruments. The problem is that it is difficult to tell
who they are ahead of time without conducting other assessments. The point is
that language minority populations are culturally-linguistically different,
and this means that they are behaviorally different, often even in the
realm of test-taking behaviors.

Similarly, many popular evaluation schemes, such as those which require
pre-posttesting with all-English achievement batteries, may be thoroughly
confounded by apparent gains made by certain students in social studies or
science when what has really happened is that they have learned to read.
At the lower ranges, students who remain limited in English proficiency
(LEP) experience a cumulative deficit, since norms wait for no one. Should
the scores of these two groups be averaged together... presto! No gains.
Evaluation designs need to be especially sensitive to crucial intervening
variables and, as will be shown later, to special conditions which affect
studies in naturalistic settings.
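
The averaging problem just described can be illustrated with a small
sketch. All numbers are hypothetical percentile ranks invented for
illustration, not data from any study, and the group labels and the
helper name mean_gain are assumptions of this example.

```python
from statistics import mean

# Hypothetical (pre, post) percentile ranks, invented for illustration.
# Group A: students whose apparent gains reflect having learned to read.
# Group B: students who remain LEP and slip against rising norms.
group_a = [(20, 34), (25, 39), (18, 32)]
group_b = [(22, 8), (19, 5), (24, 10)]


def mean_gain(pairs):
    """Average post-minus-pre difference for a list of (pre, post) pairs."""
    return mean(post - pre for pre, post in pairs)


gain_a = mean_gain(group_a)                 # gain for group A alone
gain_b = mean_gain(group_b)                 # gain for group B alone
gain_pooled = mean_gain(group_a + group_b)  # gain when groups are pooled

print("Group A gain:", gain_a)
print("Group B gain:", gain_b)
print("Pooled gain: ", gain_pooled)
```

In this constructed case the two subgroup gains cancel exactly, so a
pooled report would show no change even though neither group stood still.
Reporting gains separately by an intervening variable such as English
proficiency status is one way a design might avoid this.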

How bilingual children may be accommodated in the New FT is also an
important question. In the "old" Follow Through study, bilingual children
were so much extraneous "noise" for most of the models and did not figure
prominently in the analyses. Under the New FT, they will be included from
the outset, but whether language minority children are relegated to certain
New FT models or are accommodated by all models has important implications
for evaluation.

One way to accommodate language minority students is for all of the
New FT models to make provision for some type of parallel bilingual
instruction. While this might sound preposterous to some, consider that models
must be adaptable to a great number of school sites, that LEP children are
found in virtually all major school systems and many small ones, that
several states have mandatory "bilingual" education for LEP students, and
that desegregation efforts could impact a program by introducing LEP
children into a previously English monolingual setting.

Another way to proceed would be to designate two, and preferably more,
models as bilingual models under the New FT. This would be a step in
keeping with now traditional thought in compensatory education. Yet even here
some of the concerns just discussed may have bearing. Would such models be
used exclusively with language minority children? If so, how might they
accommodate desegregation orders? Bilingual education has not always fared
well in desegregated settings (see Zirkel, 1977). If non-LEP, non-language
minority children can participate with LEP and even with bilingually
proficient children, would the model provide a diluted bilingual treatment, one
which would implicitly give preference to English speakers and seek to
"transition," or "reclassify," or "exit" the language minority child at the
earliest possible moment? This might happen, for instance, if the bilingual
model were really nothing more than a set of English-as-a-second-language
activities appended to an English model.

This writer feels that any exclusively defined bilingual model would be
limited in its applicability to ethnically stable, relatively homogeneous
sites characterized by a large majority of language minority students. Many
educators would, of course, find such a model useful, assuming that it would
be effective. Furthermore, the measurement and evaluation issues alluded to
earlier could be isolated in the New FT impact study. But such a model would
not answer the needs of most children or most educators at most sites, would
not be in keeping with the original thrust of Follow Through, which Hodges
(1978) characterized as a genial innovation "for tying theory and research
to actual educational practice" (p. 190). It would not, in short, educe our
best efforts to innovate, to anticipate the future, and to make the best use
of current knowledge in curriculum development, testing, and evaluation.
Perhaps the most innovative thing which the New FT could do for
compensatory education would be to produce interventions which do not appear
to be compensatory, do not egregiously press children to learn and become
"just like normal Americans." Perhaps we should start thinking about
producing quality programs (Gonzalez, 1979), complex curricula
which provide different options for children: strands, if you would.
Perhaps we need to devise tests which approach the challenge of measurement
not from the perspective of convenience for the test maker or the test user
but with a view to being able validly to assess many populations, including
the dominant Anglo cultural group, simultaneously or in equivalent ways.

And so I come to the end of these introductory remarks with a
recommendation: The National Institute of Education (NIE) should prepare
RFPs for two feasibility studies: (1) developing New FT models for
culturally and linguistically integrated settings, (2) devising or adapting
tests in the cognitive, achievement, and affective domains of interest to
the New FT for English proficient and LEP children. Alternatively, NIE
could prepare non-binding RFPs on either or both of these topics to see if
feasible, defensible proposals turn up.

We need to know what our options are, realistically. If we do not
consider deliberately the relationship of language minority students to the
entire New FT effort, their presence by design or accident may become a
nuisance, a "noise" or cacophony which our interventions, instruments, and
methodology are ill prepared to orchestrate.



Assessing Language Minority Students: The State of the Art
Briefly

Hispanics and other language minority groups have become victims of
test abuse and test misuse because (1) they have not been adequately
represented in the samples of students used for test development (Green, 1972),
(2) their language characteristics and lack of test sophistication have not
been taken into account in research and evaluation designs or in individual
test interpretation and educational decision making, (3) test results have too
often been of little practical value, and (4) staff knowledge of test scores
has produced a self-fulfilling prophecy effect in school settings (De Avila
& Havassy, 1974). For example, whereas IQ and related tests have served to
misdiagnose disproportionately large numbers of Hispanic children into
mentally retarded or language and learning disability categories (Gerry,
1973), these instruments have not been especially helpful in identifying
children at the other end of the ability spectrum, the gifted (Bernal,
1974).

Although a few testing companies have in recent years been making
progress in meeting some of these testing problems and developing more valid
tests for minority groups, psychologists in the field of measurement and
test developers have generally not dealt with these issues and have not
sought to impact those aspects of test misuse which are under their control
or influence (Bernal, 1975). Instead, those that have articulated on the
issues have either shifted the blame to the practitioner (e.g., Cleary, et
al., 1975) or, arguing that tests have sufficient validity for some purposes
(often predictive validity), have been satisfied to indicate that test
scores merely describe the parameters of the problem, but do not create it
(e.g., Jacobson, 1977).

Still, legal and social pressures and a haunting, if vague,
dissatisfaction with a seemingly endless litany of apologies have caused test
developers and psychometrists to take steps to rectify abuses and misuses in
the field. Unfortunately, the measures undertaken have frequently been the
source of new problems while not really ameliorating the basic condition.

Malpractices, In Passing 

The first malpractice, most often found in field settings, consists of
"adding points" to obtained scores of language minority students. This
procedure is, of course, basically a way of making low test scores more
palatable, since it does nothing to increase a test's validity. Sometimes the
number of points to be added is subjectively but experientially determined; in
other instances the number is based on the average difference between Anglo
and minority scores, a very questionable practice indeed, especially when
applied to individuals. The method is wrong but the motive for adding points
is that educators working with minority children sometimes find that many of
them have achieved more than the test scores indicate. Doubtlessly, one of the
reasons why various national and state educational organizations have not been
friendly to the use of certain types of tests, especially with minority
populations, is that too many teachers do not believe their results (e.g.,
NEA, ca. 1900).

A second malpractice involves simple renorming, i.e., the computation
of ethnic norms, often locally. Renorming accomplishes what adding points
does, but the numbers are determined empirically. The only real advantage
of renorming is that it provides good descriptive statistics for a particular
ethnic population and a better distribution of scores. But renorming appears
to the uninitiated to do more, to somehow make the test better. It does not.
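
The claim that neither adding points nor renorming improves a test can be
sketched numerically. The sketch assumes, for illustration only, that
validity is indexed by the Pearson correlation between test scores and
some external criterion; under that assumption any constant shift or
linear rescaling of scores leaves the correlation unchanged. The data and
the names scores, criterion, bonus, and pearson_r are hypothetical.

```python
from math import sqrt
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical raw test scores and an external criterion (e.g., grades).
scores = [12, 15, 9, 20, 14, 11]
criterion = [2.1, 2.8, 2.5, 3.0, 2.2, 2.6]

# "Adding points": the same bonus for every examinee.
bonus = 10
shifted = [s + bonus for s in scores]

# "Renorming": standard scores computed from the local group's own
# mean and standard deviation (scaled to mean 50, SD 10).
m, sd = mean(scores), pstdev(scores)
renormed = [50 + 10 * (s - m) / sd for s in scores]

r_raw = pearson_r(scores, criterion)
r_shifted = pearson_r(shifted, criterion)
r_renormed = pearson_r(renormed, criterion)

print(r_raw, r_shifted, r_renormed)  # the three values agree
```

Both transformations change how the scores look without changing how well
they track the criterion, which is the sense in which the test itself is
no better afterward.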

Test translation without tryout and subsequent modification and
validation has also become a popular practice, whether done by a testing company
or locally by a practitioner. Sometimes only the directions are translated,
but often the entire test is recast into another language, usually Spanish.
I have witnessed individually administered tests presented in both languages,
a procedure which involves the repetition of each item and which produces
an unsystematic practice effect on scores, depending on a child's bilingual
skills.

Some testing companies' brochures illustrate English and translated
versions of a test in a way which suggests that they are parallel forms,
when in fact no empirical verification or equating procedure has been
attempted, not even back translation, a technique which has proven so
useful in equating the meanings of statements in cross cultural research
(Manaster & Havighurst, 1972). In fact, some translated, multiple choice
tests are so "parallel" that even the position of the correct answer is
unchanged, a transparent travesty when one considers that both versions are
sometimes administered to the same students in quick succession, again an
untoward practice effect. Furthermore, some translated tests have no norms
for the non-English language version; test users are left to assume that
the English norms are applicable.

The psychometric and practical problems with test translation are many.
Obviously some types of tests, such as simple psychomotor or discrimination
tasks or straightforward computation problems, can usually be presented in
another language with little adaptation, particularly so when no reading is
required of the examinee. Even here, however, cultural content should be
checked and test directions back translated, whenever appropriate, and
submitted to a trial phase. Vocabulary tests or problem solving tasks involving
cultural content or internal verbal mediation cannot be simply translated
without risking the alteration of item characteristics or the factor structure
of the tests. In other words, translation usually changes the difficulty range
of an item (e.g., if spangle is translated to lentejuela, the item changes
in difficulty for Hispanic students). Translation may also change the
options a student may otherwise have in answering an item (e.g., stamp may
be a verb or a noun in English, but timbre, estampilla, or sellar in Spanish
limit the usage of the word). Items recast into another language may be
more or less useful in differentiating more accomplished students from their
below average peers. Finally, a test which measures one factor for Anglos
(e.g., practical intelligence: "What should you do if you cut your finger?")
might be measuring another factor for Hispanics (e.g., degree of
acculturation to Anglo values and practices), especially if scoring criteria
have a limited range of acceptable responses.

Most often translated tests use a relatively formal standard dialect
to produce expeditiously a test which will appeal to as wide a group of
potential customers as possible. The result, tragically, is that some
language minority students who speak a dialect of the language and who have not
had sufficient bilingual education score low on tests in both languages.
In still other cases (fortunately few) all language minority children
entering school for the first time are tested exclusively in the non-English
language, thereby penalizing those who are most proficient in English, a
special case of test misuse which once again places language minority
students in a disadvantaged situation.

Another malpractice is the administration of selected subscales of
larger diagnostic and intelligence tests to language minority students. If
this practice were based on empirical findings of greater reliability or
validity for certain subtests, there would be little reason to object;
however, this practice usually rests on the belief that LEP students score
higher (i.e., look better) on some subscales than on others. Performance
subscales, for example, are often preferred by practitioners over verbal
scales, in spite of the fact that basing general interpretations on
performance tests has usually yielded disappointing results, both for the
Anglo population (Nunnally, 1959) and for different cultural and national
groups as well (Anastasi, 1976). As a rule, then, the decision to
administer only certain subtests to language minority students should be based
on empirical studies which incorporate relevant linguistic and ethnographic
variables in their designs.

The last malpractice to be discussed is the profligate use of so-called
out-of-level testing with LEP children. The argument goes among some
psychometrists and evaluators that since LEP children modally score so low
on English-based achievement tests, some technique is necessary to generate
more variance and normalize the distributions. Out-of-level testing does
this, ostensibly, but makes interpretation difficult even with the
application of expanded standard scores. Such testing, in my opinion, is rarely
used to enhance individual diagnosis. Instead, these data are summarized,
and the resultant reports often becloud the problem, however lamely, with
passing references to the normative standard or the introduction of grade
equivalent explanations. Out-of-level testing, in short, becomes a
statistical legerdemain for "adding points."

All of these malpractices have come about because of one simple fact,
often intuited but rarely admitted and important to the New FT's planning:
there are precious few reliable, valid tests to use with LEP students. The
prescription is also simple, or at least straightforward: develop tests,
from scratch where necessary, which adequately measure constructs of
interest in these populations.

Language Proficiency Assessment

The practice of testing non-English native language skills has become
so tied to bilingual education, so intertwined with entry-classification and
exit-reclassification practices, which involve the additional assessment of
English language skills, that it is difficult to speak about the assessment
of proficiency in one language alone and without thinking of its use in
educational decision making. The problems and issues which besiege the testing
of non-English language skills, and the opportunities to improve the
associated measurement strategies and instruments, are closely paralleled
in the English domain, and since we will be mainly discussing the testing
of children who are actually or nascently bilingual, it is easy to meld
one's concerns and thoughts on these subjects.

It seems also that the testing of non-English native language skills
is in most bilingual program settings not done as often or as extensively
(in terms of different aspects of language) as the assessment of English
language skills. In this author's opinion this occurs primarily because
extant federal and state program eligibility requirements emphasize English
over native language skills. Then, too, the fact that most bilingual
programs are transitional in nature probably augurs for greater testing in
English, instruction in which is closer to traditional educational ideology
(Banks, 1979).

So although research on bilingualism in the schools indicates the
importance of measuring both languages on a regular basis, the testing of
Spanish language skills continues to languish in both quality and frequency
(Bernal, 1977). The time has come for intelligent, multidisciplinary
cooperation among bilingual educators, linguists, and psychometricians to
design and produce a variety of valid instruments (Bernal, Note 1) which
simultaneously address the pedagogical and classificatory needs of
bilingual programs and evaluators' and researchers' needs for versatile and
accurate measures of educationally consequential skills.

Historically the concepts of limited English speaking ability (LESA)
and more recently limited English proficiency (LEP) were established more
for compliance accountability than for curricular planning. LESA can be
measured by tests which measure a student's aural comprehension and
speaking proficiency in English; most extant tests of language proficiency were
developed at the time when this construct was in vogue. LEP, on the other
hand, is more comprehensive, at least beyond the second grade or so, when
skills in reading and writing ascend in importance. To this writer's
knowledge, no single test or test battery for measuring LEP beyond the second
grade exists, and unless new Title VII regulations or Lau guidelines specify
or operationalize this construct, individual programs must determine what
this means for themselves (Bernal, Note 2).

Many bilingual programs, unfortunately, administer only the English
parts of these language assessment tests, reasoning (in this case
correctly) that a LESA child is also LEP. In the later elementary grades,
however, a non-LESA child may be LEP, as discussed above. Several good reviews
of Spanish proficiency tests exist (e.g., Silverman, Noa, & Russell, 1976;
Dieterich, Freeman, & Crandall, Note 3), but they seem to converge weakly
on the conclusion that few good ones are to be had, and are more useful for
indicating what to avoid than for what to do.

Still, knowledge of LEP status alone, without data on the child's
ability in Spanish, has limited usefulness for designing appropriate
interventions. Given a LEP child entering school for the first time, information
about her/his Spanish competencies might lead us to suspect the validity of
the testing administration, help us decide to refer the child for further
assessment, provide important placement information, or screen children
whose native or English language skills might be very "mature," i.e., much
more developed than those of their typical agemates.

Cummins (1979) suggests that the continuous assessment of bilingual
students' progress in the development of native language skills is important,
particularly if one wants to predict their success in an all-English
educational environment. Native language achievement is an indicator of students'
general academic potential in English as well. His very recent theoretical
work (Cummins, 1980) distinguishes between basic interpersonal
communication skills (BICS) and cognitive/academic language proficiency (CALP), the
metalinguistic skills which provide the kinds of learning advantages that
some bilinguals seem to enjoy. It is the CALP in one's first language (L1)
which predicts success in the second language (L2) environment. It seems
that extant language proficiency tests mostly measure BICS, and thus have
limited utility for diagnosis or educational placement and classification.

Language proficiency testing utilizes criterion-referenced,
norm-referenced, or a combination of both techniques to establish the level of
an examinee's language mastery, and proficiency can be measured through
interview techniques or paper-and-pencil tests, depending on the aspects of
language (productive or receptive skills) one wishes to define as appropriate
to a particular age/grade level or to a specified role/situation (such as the
proficiency required for a bilingual teacher). Tests of language proficiency,
unlike popular measures of vocabulary and reading, emphasize aspects of
linguistic competence.

Language dominance is a construct properly reserved for the nascent or
functioning bilingual. It may be defined operationally as the higher of two
language proficiency levels. There is a great demand for measures of
language dominance, particularly for Hispanics, from early childhood through
the early elementary years. Bilingual education and ESL programs variously
use language dominance appraisals to accept children, place them in
instructional groupings, assess their language progress, evaluate certain aspects
of curricula, and in the case of transitional bilingual education, to
determine the appropriate point at which a student is ready to exit the
bilingual program and enter the English monolingual course of instruction
ordinarily offered in the schools.

Language dominance assessments made without an examination of language
proficiency have, in my opinion, fostered two related and tacitly held beliefs
which desensitize educators to individual differences. One is that children
cannot be proficient in the language in which they are not dominant; the
other is that children must be competent in their dominant language. Some
bilingual children, like some monolinguals, do have a language dysfunction,
and this affects their language competence even in their dominant language.
Normal and, certainly, gifted children acquire two language systems readily,
although they may still be more proficient in one of them.
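
The operational definition of dominance as the higher of two proficiency
levels, and the reason a dominance label by itself says little, can be
sketched as follows. The 1-to-5 proficiency scale, the cutoff of 3 for
"proficient," and every name in the code are hypothetical choices made for
illustration; the text specifies no particular scale or instrument.

```python
from dataclasses import dataclass

PROFICIENT_CUTOFF = 3  # hypothetical threshold on a hypothetical 1-5 scale


@dataclass
class LanguageProfile:
    spanish: int  # proficiency level in Spanish
    english: int  # proficiency level in English

    def dominance(self) -> str:
        """Dominant language = the language with the higher level."""
        if self.spanish > self.english:
            return "Spanish"
        if self.english > self.spanish:
            return "English"
        return "balanced"

    def proficient_in(self) -> list:
        """Languages in which the child meets the proficiency cutoff."""
        levels = (("Spanish", self.spanish), ("English", self.english))
        return [name for name, level in levels if level >= PROFICIENT_CUTOFF]


# Two hypothetical children with the same dominance label.
child_a = LanguageProfile(spanish=5, english=4)
child_b = LanguageProfile(spanish=2, english=1)

for label, child in (("A", child_a), ("B", child_b)):
    print(label, child.dominance(), child.proficient_in())
```

Both children would be labeled Spanish dominant, yet one is proficient in
both languages and the other in neither, which is why dominance needs to
be read alongside the underlying proficiency levels.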

The quality of tests for measuring language proficiency varies
considerably, but not one to date is truly outstanding, or even
satisfactory. This author has served on an advisory committee on proficiency
assessment to the Texas Education Agency.* The committee has reviewed dozens of
tests and has found them all wanting to a greater or lesser degree in terms
of traditional psychometric criteria or linguistic content and organization
(CELAI, Note 4).

*Committee for the Evaluation of Language Assessment Instruments (CELAI).

In their worst forms language proficiency tests pander to schools'
tight bilingual budgets and some districts' desires not to identify LEP
children, while others rest on highly questionable assumptions, purport to
measure the impossible (e.g., purport to measure language dominance without
measuring language proficiency), or do not guide the users whatsoever in the
interpretation of the results. The rest too often present linguistically
unrealistic demands (e.g., "Now we're going to talk in English," or "Please
use complete sentences") or arbitrary scoring or weighting procedures, and
generally suffer from a lack of sensible items, sufficient language sampling,
and reliability. Then, too, not even a handful have been validated against
groups of proficient monolinguals, and none have been examined in the light
of the demand characteristics of bilingual or English monolingual classrooms.
Tests that use scales or ordered categories of proficiency are not sensitive
to the student who is marginally proficient in either language alone but
nevertheless communicatively competent in informal bilingual settings where
codeswitching is the rule. The fact that an increasing number of bilingual
programs are including non-LEP (i.e., English proficient, or EP) students
in the program, for many of whom Spanish is in effect a second language,
also necessitates the testing of Spanish language skills.

Testing School Achievement 

The testing of school achievement areas in a non-English language
poses other problems, not the least of which is the lack of tests well
suited for many language minority groups. A few tests in Spanish are on
the market, but these are essentially translated (and in some cases renormed)
versions of the English achievement series published by the same company.
These translations have item-by-item similarity with the original tests in
English, which effectively precludes the use of both English and Spanish
versions of the same level test on the same bilingual children, since
exposure to one would produce a practice effect for the other. A notable
exception is the CIRCO test battery (Bernal, Note 5; Hardy, Note 6), which
was adapted (not merely translated) from the CIRCUS (in English) and which
includes new tests developed for Hispanic students. This test, however,
can be used only with four to six year olds. Most non-Hispanic language
minority groups simply have not had standardized achievement tests developed
for them to date.

This and the fact that locally developed instruments do not have the
credibility of commercial instruments are the principal reasons why
achievement testing is conducted by and large in English. Such testing of LEP
children, however, produces considerable personal and statistical fallout.

In this writer's evaluation consultations with bilingual programs in
public school settings, he has seen instances where grade level averages,
involving dozens of bilingual classes in several schools, have just reached
chance level performance on nationally standardized achievement tests. I
cannot say, of course, just how widespread a phenomenon this is, but if
OBEMLA's* plans to implement (on a voluntary basis) a standard Title VII
data reporting form (Baca, Bernal, DeGeorge, & Mangino, Note 7) are carried
out, then such data can be calculated.

*Office of Bilingual Education and Minority Languages Affairs.

These tests can be frustrating to LEP children, and their results, as
discussed earlier, often lack credibility with teachers. For evaluators,
too, the results can be frustrating, since a large percentage of scores at
or below chance makes for fairly arbitrary interpretations of the results,
interpretations which historically have placed the burden on minority
children, the inadequacy of their educational programs, and their economic
circumstances, instead of calling for a reexamination of the tests
themselves (Bernal, 1975).
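
To make "at or below chance" concrete: on a multiple-choice test with n
items and k options per item, blind guessing yields an expected raw score
of n/k with a standard deviation of sqrt(n * (1/k) * (1 - 1/k)), assuming
independent items with exactly one keyed answer each. The sketch below
reports the share of raw scores that fall within a chosen band of the
chance score. The test length, option count, scores, and the
one-standard-deviation band are hypothetical assumptions, not values from
any test discussed here.

```python
from math import sqrt


def chance_stats(n_items, n_options):
    """Expected raw score and SD under pure guessing (binomial model)."""
    p = 1 / n_options
    return n_items * p, sqrt(n_items * p * (1 - p))


def share_near_chance(raw_scores, n_items, n_options, band=1.0):
    """Fraction of scores at or below chance + band * SD."""
    expected, sd = chance_stats(n_items, n_options)
    cutoff = expected + band * sd
    flagged = [s for s in raw_scores if s <= cutoff]
    return len(flagged) / len(raw_scores)


# Hypothetical 40-item test with 4 options per item.
N_ITEMS, N_OPTIONS = 40, 4
raw_scores = [8, 11, 12, 9, 25, 13, 10, 30]  # invented raw scores

expected, sd = chance_stats(N_ITEMS, N_OPTIONS)
share = share_near_chance(raw_scores, N_ITEMS, N_OPTIONS)

print("chance score:", expected)
print("share of scores near chance:", share)
```

When a large share of a group's scores cannot be distinguished from
guessing, group averages and gain scores built on them carry little
information about what the children actually know.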

Recent test reviews of English-based tests used in bilingual programs
(e.g., Silverman, Noa, & Russell, 1976) have started to emphasize test
appropriateness as an important aspect of validity. The valid application
of a test assumes that the examinees are not unlike the group(s) upon which
the test was developed and standardized. To the extent that important
psychological differences exist (such as in cultural background and language
proficiency), test results must be interpreted with caution and supplemental
measures of the trait or construct in question should be utilized to
cross-check the results.

In achievement testing, too, a psychometric lag occurs, since we know
how to obtain "better" performance from language minority students on
standardized tests (Bernal, 1977), performance which increases their scores
and enhances test reliability and, hopefully, measurement validity.
Specific recommendations will be included in a later section of this paper.



Teacher Assessments

In addition to the assessment of students there is a growing trend to
assess teachers in the non-English (and sometimes in English) language
skills (Carlisle-Zepeda & Saldate, 1978). This author endorses this trend
(although he is aware of the political agendas which sometimes motivate it),
because one reason why so many bilingually certified teachers do not teach
in Spanish is that their basic Spanish skills are inadequate or that their
content-related vocabulary is lacking. Effective teachers in bilingual
classrooms are both professionally and interpersonally articulate (Rodriguez,
1980).

One of the techniques being used to assess teachers' non-English language
competency is the Language Proficiency Interview (LPI), a taped and blindly
scored version of the technique used by the Foreign Service Institute and
the Peace Corps to test the conversational abilities of their trainees. The
LPI is used by New Jersey, and Texas uses it in addition to a standardized,
multiple-choice test of Spanish language competencies for previously
certified teachers seeking to obtain a bilingual endorsement through
additional course work. Obviously cloze tests and other procedures could be
used. What is still needed is a better measure of teachers' non-English
writing skills. Techniques for estimating the adequacy of writing samples in
English have been developed, and these could be adopted for the assessment
of non-English writing skills as well.

Summary

Bilingual assessment is now recognized as crucially important for the
selection of bilingual teachers, for the screening, placement, and
reclassification of LEP students, and for designing appropriate bilingual
educational programs. Extant instruments, unfortunately, are not entirely
capable of satisfying these needs.

What Should Be Measured to Study Program Effects on Language Minority
Students in the New Follow Through: An R&D Agenda



In this section we will examine variables which are particularly
important for describing programs for language minority students and
studying the interactions of these programs with various student
characteristics and local conditions. It is assumed that many other
variables will be considered in the ordinary progression of events in
evaluation design and implementation, so emphasis will be placed here on
bilingual models, students, and sites.



Model/Program/Classroom Variables

The following table (Table 1) presents in summary form the
model/program/classroom variables of high potential interest to the New FT.

Table 1

Model/Program/Classroom Variables for Studying the Impact
of the New Follow Through on Language Minority Students

Teacher language proficiency in English:

- Speaking proficiency: conversational, general educational, and
curricular areas.

- Writing skills, general.

Teacher proficiency in the non-English language:

- Speaking proficiency: conversational, general educational, and
curricular areas.

- Writing skills, general (when applicable).*

Instructional aide's language proficiency in English:

- Speaking proficiency: conversational and curricular areas.

- Writing skills, general (when applicable).

Instructional aide's proficiency in the non-English language:

- Speaking proficiency: conversational and curricular areas.

- Writing skills, general (when applicable).*

Division of instructional duties in L1 and L2 by teacher and aide.

Instruction given in L1 and L2:

- Total time in each.

- Percent of instruction in each.

- Content areas affected by each: reading, math, social studies, etc.

Formal and informal language interactions between aide and teacher, teacher
or aide and students:

- Use of L1 and L2 by function (e.g., instruction, encouragement,
direction, discipline).

Parental participation:

- In parent-teacher conferences.

- In bilingual classroom activities.

- In parent training (if applicable).

- Use of L1 and L2 in these activities.

Instructional management:

- "Pull-out" vs. integral.

- Timing of L2 introduction: delayed, simultaneous, or immersion.

Degree of ethnic/linguistic integration:

- Ethnic/language minority groups represented.

- Language proficiency categories represented.

*Some non-English languages found in bilingual programs have no standard
written form.


While the instructor's proficiency in English and the non-English
language has always been regarded by experts as crucial for the success of
bilingual education programs (for example, Center for Applied Linguistics,
1974; California State Department of Education, Note 8), recent empirical
evidence (Rodriguez, 1980) tends to show that higher levels of fluency and
linguistic flexibility differentiate the better bilingual educators from
their average and below average colleagues. Numerous approaches to the
measurement of instructor proficiency have been tried (Bernal, Note 2),
ranging from evidence of college credit in the non-English language to
structured interviews and standardized paper and pencil tests.

Most of these techniques, however, have been "make-do," and none singly
satisfies the need to measure all receptive and productive language skills.
This writer's hunch is that the implementation of the non-English facets of
a bilingual program depends in no small way not only upon teachers' and
aides' conversational skills, but also upon their ability to converse
professionally in the language, on their knowledge of content-related
phrases, and on their general writing ability. The same may be said for
their skills in English. Bilingual educators, in other words, probably need
to be competent to understand, speak, read, write, and teach the languages
involved in a particular program model before they will actually use them to
any great extent in the classroom or for communicating with language
minority parents.

In order to measure these skills adequately, some techniques need to
be applied to the task. The New FT should call for the adaptation of the
Language Proficiency Interview (LPI) (ETS, Note 9) to include discussions
about professional topics and instructional areas. The LPI technique
records these interviews and has them scored blindly by trained raters who
use a criterion-referenced scale with five or six major and four or five
minor ranks. If correctly adapted, such a scheme could yield separate
scores for general and pedagogical areas.

Although writing skills in the non-English language could be measured
by multiple-choice tests, this writer urges that more demanding tests be
developed on already existing techniques. Dictation and cloze test
techniques (Oller, 1975; Oller & Streiff, 1975) are particularly good,
inexpensive techniques to measure the grammatical components of writing
skills. Cloze tests, furthermore, can be cast into multiple-choice formats.
Scaling of these techniques probably needs additional attention, however.
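
As a minimal sketch of how a cloze passage might be assembled, the Python
below deletes every fifth word of a passage and scores responses by exact
match. The deletion ratio, the function names (make_cloze, score_exact),
and the exact-word scoring rule are illustrative assumptions, not
procedures specified in the sources cited above.

```python
import re


def make_cloze(text, every=5):
    """Blank out every `every`-th word; return (passage, answer_key).

    Assumption: fixed-ratio deletion. Tokens that are not a plain word
    followed by optional punctuation (e.g. "don't") are left intact.
    """
    blanked, key = [], []
    for position, token in enumerate(text.split(), start=1):
        if position % every == 0:
            # Keep trailing punctuation visible in the passage.
            match = re.match(r"^(\w+)(\W*)$", token)
            if match:
                key.append(match.group(1))
                blanked.append("_____" + match.group(2))
                continue
        blanked.append(token)
    return " ".join(blanked), key


def score_exact(responses, key):
    """Count responses that reproduce the deleted word (case-insensitive)."""
    return sum(r.strip().lower() == k.lower() for r, k in zip(responses, key))


passage, key = make_cloze(
    "The teacher reads a short story to the children every morning "
    "before class."
)
print(passage)
print(key)
```

Because Python's `\w` matches accented letters, the same sketch applies to
a Spanish or French passage; a multiple-choice variant would simply attach
distractors to each entry in the answer key.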

Writing samples scored holistically can also be employed, although this
procedure requires somewhat more expensive scoring by teams of readers.
Identifying and assembling groups of highly competent readers for certain
languages could be difficult, but major languages such as Spanish or French
could be handled readily in this manner. The advantage of writing samples
centers around their face validity, in that the examinee must produce an
essay on one or more assigned topics designed to permit scope of expression.
Fairly reliable scoring techniques have been developed for English writing
samples, and the same could be done for other written languages.

The division of instructional duties in L1 and L2 by teacher and aide
has been of concern to bilingual educators for some time. Many of them are
basically concerned that the more prestigious person, the teacher, will
conduct instruction in English and relegate the use of the non-English
language to the aide, thereby influencing language attitudes in an
undesirable manner (Bernal, Note 10). But there are additional issues which
have to do with a program model's implementation in a classroom and, very
importantly, with time-on-task, which seems to be particularly important for
compensatory educational programs (Davidson & Holley, 1979).

Thus instruction given in L1 and L2 involves total time, percent of
instruction in each language, and the content areas affected by these
instructional modes. Understanding who teaches what in which language and
for how long is potentially important for teasing out differences in program
impacts (Saville-Troike, 1978).

Instructional logs or questionnaires could be designed by the model
developers to keep track of these figures. But occasional spot checks would
be necessary to validate these procedures, and it is hoped that the
evaluation of the New FT incorporates plans to conduct extensive on-site
observations.
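
To illustrate what such a log might yield, the sketch below tallies
hypothetical log entries into total minutes per language, percent of
instruction in each language, and minutes per staff member, language, and
content area. The entry layout (staff, language, content area, minutes)
and the name summarize_log are assumptions made for illustration only.

```python
from collections import defaultdict


def summarize_log(entries):
    """Summarize (staff, language, content_area, minutes) log entries.

    Returns (minutes_by_language, percent_by_language, minutes_by_cell),
    where a cell is a (staff, language, content_area) combination.
    """
    minutes_by_language = defaultdict(float)
    minutes_by_cell = defaultdict(float)
    for staff, language, area, minutes in entries:
        minutes_by_language[language] += minutes
        minutes_by_cell[(staff, language, area)] += minutes
    total = sum(minutes_by_language.values())
    percent = {}
    if total > 0:
        percent = {lang: 100.0 * m / total
                   for lang, m in minutes_by_language.items()}
    return dict(minutes_by_language), percent, dict(minutes_by_cell)


# Hypothetical one-day log for a single classroom.
log = [
    ("teacher", "L2", "reading", 45),
    ("aide", "L1", "reading", 30),
    ("teacher", "L2", "math", 40),
    ("aide", "L1", "math", 35),
]
minutes, percent, cells = summarize_log(log)
print(minutes, percent)
```

A table of this kind makes visible the pattern bilingual educators worry
about, in which the teacher's minutes fall almost entirely in one language
and the aide's in the other.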

Which brings us to the next topic, formal and informal language
interactions between aide and teacher, teacher or aide and students.
Classroom observation procedures such as those inspired by Flanders (1961)
could be designed for use by bilingual observers. A short term developmental
effort is foreseen in this area to test the observational system's usability
and reliability, keeping in mind that similar procedures have required
careful observer training, spot checking (Reid & DeMaster, 1972), and
recalibration (Reid, 1970). Adding a bilingual dimension to such systems may
require compromising the scope of the interactions to be observed or the use
of an additional observation schedule which focuses on other interactive
processes. It remains to be seen whether an effective bilingual interaction
form may be used to supplement an observation technique already well
developed.
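
One way a spot check of observer reliability might be quantified is with a
chance-corrected agreement index. The sketch below computes Cohen's kappa
for two observers who coded the same intervals; kappa is this sketch's
choice of index, not a procedure named in the sources cited, and the code
labels (language plus function) are hypothetical.

```python
from collections import Counter


def cohen_kappa(codes_a, codes_b):
    """Cohen's kappa for two observers' codes of the same intervals."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(codes_a)
    # Proportion of intervals on which the observers agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Agreement expected by chance from each observer's marginal rates.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both observers used a single identical code throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical codes for four observation intervals.
observer_1 = ["L1:instruction", "L1:instruction", "L2:direction", "L2:direction"]
observer_2 = ["L1:instruction", "L1:instruction", "L2:direction", "L1:instruction"]
print(cohen_kappa(observer_1, observer_2))
```

Values falling over successive spot checks would be one signal that
observers need the recalibration mentioned above.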

Parental participation data can probably be supplied accurately by
program personnel by keeping good records of meetings and other types of
contacts. During the recent conference on the Longitudinal Evaluation of
Bilingual Programs (see Bernal, 1980b; Contreras, 1980), the positive and
negative effects of different kinds and levels of parental participation
became evident. The New FT program should monitor these effects carefully.

Many aspects of instructional management could be emphasized in the
New FT. Two general concerns arising from practices in the field (Bernal,
Note 10) have been selected for inclusion, since data on these elements
should tell us much about New FT programs' approaches to teaching language
minority children, especially LEP children.



The first concern has to do with classifying a program as "pull-out"
or integral. The chief characteristic of the pull-out approach is that
non-English instruction is provided only to LEP students and conducted by a
resource teacher who works with them for a limited period during the day.
It is not unusual in such a program for the "bilingual" teacher to service
several organized classes by working with small groups in a learning center
in the "home room," or to conduct a number of special classes made up of
students "pulled out" of their regular classrooms for bilingual instruction
or tutoring in English.

An integral program provides bilingual instruction to children in a
regular classroom setting by the regular classroom teaching staff. Academic
content is to some extent taught in both languages, and often non-LEP
children participate in these activities as well.

Obviously, some classrooms may be "mixed," as in the case where
sufficient instructional resources exist for one language minority group but
not for another.

The second concern has to do with timing the introduction of English to
the LEP child. The delayed introduction of English for instruction in the
content areas requires that such instruction be essentially monolingual in
the non-English language and that the study of English assume the status of
a subject in a broader curriculum. As competencies in English are acquired
by the children, the academic areas assume a more bilingual orientation.
Children in such programs are often able to read and write in their first
language before they are introduced to these skills in English.

The simultaneous introduction of English begins content instruction in
both languages from the first day. Severely LEP students may be provided
some additional help in the native language, but by and large the atmosphere
is decidedly bilingual and one language helps to support the other. English
reading and writing instruction are introduced as quickly as possible.

An immersion approach is difficult to characterize accurately. It is
not an English-as-a-second-language (ESL) technique, and it is not the
traditional "sink-or-swim" system which LEP students have so long endured.
It is, instead, a carefully sequenced system of content instruction which
helps children intuit the language as classes progress (Cohen, 1975; Lambert
& Tucker, 1972; Bernal, Note 10). Learning aids, highly animated teaching,
and sensitive adjustments of the English demand characteristics of the
classes educe the desired English skills, ostensibly without jeopardizing
the normal development of the mother tongue. Non-English instruction is
later introduced as a subject.

The degree of ethnic/linguistic integration will require the use of
detailed demographic instruments down to the classroom level. Accurate
language proficiency categories will have to be devised, since current
instruments do not deal effectively with all bilinguals and since other
categorical systems, e.g., the Lau categories (Hal, 1978), would seem to be
better adapted to legal and administrative classification than to scientific
inquiry.

It is difficult to say which categories linguists and educators will
agree are of interest to a New FT evaluation study. This writer believes
that two types of categorical systems should be used. The first is based
on the LEP-EP distinction. EP students can be divided into dominant
English-speaking minority and language minority populations, and the latter
can be subdivided into once-LEP (reclassified) and never-LEP students. This
scheme would account for the presence of Anglo, other English-speaking
minority populations (principally Black students), and English monolingual
language minority background students. It would also identify those language
minority children who are currently LEP, those who were once LEP and are now
reclassified as EP (and hence important to follow up), and those who were
functionally bilingual when they first entered the program. The presence or
absence of EP language minority students will indicate whether the program
is being implemented in a transitional mode, i.e., whether it exits students
who become EP and does not directly service those language minority students
who come to school already competent in the target language.
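
The first categorical system can be sketched as a small data structure.
The field names, the four category labels, and the looks_transitional
indicator below are illustrative assumptions; in particular, treating the
complete absence of EP language minority students as the sign of a
transitional program is a deliberately crude reading of the criterion just
described.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    LEP = "currently LEP"
    EP_RECLASSIFIED = "EP, language minority, once LEP"
    EP_NEVER_LEP = "EP, language minority, never LEP"
    EP_ENGLISH_SPEAKING = "EP, not language minority"


@dataclass
class Student:
    language_minority: bool
    currently_lep: bool
    ever_lep: bool


def categorize(student):
    """Place a student in the LEP-EP scheme (hypothetical field names)."""
    if student.currently_lep:
        return Category.LEP
    if not student.language_minority:
        return Category.EP_ENGLISH_SPEAKING
    if student.ever_lep:
        return Category.EP_RECLASSIFIED
    return Category.EP_NEVER_LEP


def looks_transitional(students):
    """True when a classroom enrolls no EP language minority students."""
    present = {categorize(s) for s in students}
    return not (present & {Category.EP_RECLASSIFIED, Category.EP_NEVER_LEP})


classroom = [
    Student(language_minority=True, currently_lep=True, ever_lep=True),
    Student(language_minority=False, currently_lep=False, ever_lep=False),
]
print(looks_transitional(classroom))
```

Recomputing each student's category at every testing point would also
yield the changes in categorical membership that can serve as covariables
or as criteria for program effectiveness.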

The second type of language-based categorical system advocated here is
based upon a three-dimensional matrix of functional English and non-English
language categories and communication competence. A child's placement in
this system would depend not only upon her/his relative performance in each
of two languages, but also upon her/his ability to cope with a variety of
language tasks. Two-dimensional categorizations (based on English and
non-English languages), it is recognized, already exist in the literature on
bilingual proficiency assessment and in certain program regulations. What
is envisioned here, however, is a system which is capable of better
diagnostic-prescriptive applications (particularly for students who score at
the lower ranges of both English and non-English scales), is not misled by
the spontaneous (and, one might add, highly adaptive) codeswitching behavior
exhibited by some bilingual children, incorporates current language analysis
theory, and measures that aspect of language development (CALP) which
predicts readiness to engage in second language instruction. This will be
discussed further in the next section.

Both of these systems should be seen as dynamic, rather than static.
Categorical membership and changes in categorical membership can be seen,
respectively, as important covariables for the study of program-by-student
interactions, or as criteria for program effectiveness.

Student Variables

Given the state of the art in language proficiency assessment, good tests
using a meaningful metric must be developed through a coordinated,
multidisciplinary R&D effort. The experience gained during the development
of CIRCO (Bernal, 1977; Bernal, Note 5; Hardy, Note 6) suggests to this
writer that an English proficiency test can be constructed to accurately
measure the status of several language minority populations, so long as
great care is taken to reduce disabling test anxiety (Sarason, 1961) and to
prepare them for the testing experience. The content of such an instrument,
furthermore, should be established on native English speakers of the same
age, so that no items be included for language minority students that
English-speaking members of the dominant ethnic group cannot themselves
pass. Basing performance on native English speakers, in fact, could be one
way of establishing a meaningful metric for English proficiency.
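
The item-selection rule just described can be sketched as a simple filter
on tryout data from native English speakers of the same age. The 50
percent pass-rate threshold and the names screen_items and min_pass_rate
are assumptions for illustration; the appropriate threshold is a matter
for the R&D effort itself.

```python
def screen_items(native_responses, min_pass_rate=0.5):
    """Keep items that native English speakers pass often enough.

    native_responses maps an item id to a list of 0/1 scores obtained
    from native English speakers; items with no tryout data are dropped.
    """
    kept = []
    for item_id, scores in native_responses.items():
        if scores and sum(scores) / len(scores) >= min_pass_rate:
            kept.append(item_id)
    return kept


# Hypothetical tryout data for three candidate items.
tryout = {
    "item_1": [1, 1, 1, 0],
    "item_2": [0, 0, 1, 0],
    "item_3": [],
}
print(screen_items(tryout))
```

The pass rates computed here could also anchor the metric: a language
minority child's score can then be read against what same-age native
speakers achieve on the retained items.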

Lest we start thinking of this development effort only in traditional
terms, let me hasten to say that linguists have some innovative ideas for
judging the level of language development, including some incisive
techniques to analyze mistakes and the child's differential use of both
languages. The principal shortcoming of these procedures, by psychometric
standards, is their inefficiency. This is why a multidisciplinary effort
seems particularly appropriate at this juncture (Bernal, Notes 1 & 11).

The contemplated test should measure LEP in the more comprehensive sense
previously explicated. This means essentially that students in the second or
third grades must be tested in English reading and writing in addition to
oral language proficiency. The determination of content should pose little
problem since the more popular commercial achievement tests would seem to
have sampled these curricular domains quite well. Indeed the SWRL* Student
Placement System (SWRL, 1980), intended to assist in the assessment and
placement of language minority students, appears to have largely duplicated
the efforts of commercial testing companies (Potter, Note 12) at the early
elementary grade levels.

*Southwest Regional Laboratory for Educational Research and Development.

The innovations required for a test of LEP, in this writer's opinion,
pivot around techniques for (1) screening children for eligibility for the
English test, and (2) accommodating their diverse expectancies and
test-taking behaviors. The screening procedure envisioned would be a brief,
painless, and valid way of categorizing LEP children at the lower ranges,
children who should not be exposed to a longer, frustrating examination in
a language they barely understand. CIRCO has shown that a brief test in
Spanish can be used to select and operationally define a group of students
for whom its Spanish-based subtests are appropriate (Bernal, 1977). There
is no reason to believe that a similar process could not be used in English
assessment of LEP, or in the administration of English-based general
achievement test batteries, for that matter.

Other writers (see Bernal, 1977) have used techniques for reducing
untoward test anxiety, enhancing motivation, and familiarizing students with
those demand characteristics of the test which are not central to the
measurement objective but which if misunderstood could cause students to
receive lower marks than they would otherwise achieve, i.e., would introduce
systematic error into their measurement. Such techniques, as argued
elsewhere in this paper, have not received sufficient attention from
psychometricians, yet are pivotal to testing language minority students and
minority populations in general (Bernal, 1975).

The other half of the language assessment picture is the measurement
of non-English proficiency, and particularly of cognitive/academic language
proficiency (CALP). Now CALP as a construct is at the cutting edge of
theories of bilingualism, so it may be difficult to operationalize. We do
know some things about it, however, including that it seems to be measured
best by discrete-point (i.e., decontextualized) items of higher-order
cognitive processes mediated by the native language. Verbal learning
psychologists need to examine CALP along with psycholinguists to see how
similar it seems to be to such cognitive mechanisms as verbal mediation. If
CALP turns out to be closely related to factors which are psychometrically
more familiar, then instrument design can move ahead relatively quickly,
although, of course, it may have to be cast in several languages.

This requirement for producing diverse tests of non-English language
proficiency poses a potential financial issue for the New FT. Designing
and developing different tests in a systematic way for Spanish, French,
Navajo, Chinese, Vietnamese, and other language minority groups would be
an expensive proposition. Consequently a recommendation is in order. The
New FT should commission the development of (1) a comprehensive and broadly
comprehensible test of LEP, (2) one or two tests of non-English language
proficiency and achievement according to anticipated need, and (3) a
compatible general technique for testing the non-English proficiency and
achievement of other participating language minority groups. Under the
second part of this recommendation, proficiency tests in Spanish and perhaps
one other language would be developed on a priority basis. The general
technique espoused in the third part of this recommendation might be
developed around guidelines for criterion-referenced measurement of the
relevant language domains.

Were the New FT to decide to measure only LEP status to the exclusion
of non-English language proficiency, an important diagnostic and
classificatory base would be lost. CALP is too exciting, too potentially
useful a construct to overlook. Were the New FT to restrict the number of
different language minority groups participating in the New FT program, it
would have to either restrict the types of program sites to those which
could introduce no "surprises" in the evaluation design or find suitable
techniques for converting the highly probable statistical "noise" into
orchestratable patterns. The only other alternative is to be willing to
sacrifice important empirical data to the gods of finance.

Achievement testing in the non-English language is another matter.
There are many reasons for promoting subject matter achievement testing in
L1 for LEP students, but none, I believe, should put the burden of
supporting their development on the New FT. In this area of measurement
extant English-based achievement tests can be made to suffice so long as
adequate safeguards can be developed to protect LEP children from test
misuse.

These safeguards could include the use of the comprehensive test of
English proficiency for screening. Assuming that such a test would provide
valid assessments, there would be little point in subjecting profoundly LEP
children to a four hour battery in English. But there may be some need to
investigate several related issues further: (1) what should be the cutoff
point on the comprehensive English proficiency test for excusing students
from the achievement test; (2) are there any parts of standard achievement
batteries which can be administered validly to LEP children with or without
minor adaptation (i.e., adaptations which do not jeopardize the
comparability of scores)? Similarly, children in the New FT, whatever their
ethnicity, should receive practice in test-taking skills as part of any
model's curriculum, thereby enhancing the children's competence to cope
with such instruments (Saville-Troike, 1978).

The "old" FT used other cognitive measures in addition to achievement.
If similar plans are being made as of this writing, this author would like
to suggest the use of Piagetian tests as measures of cognitive maturity
which many language minority populations can take in their own languages
without significant bias (De Avila & Havassy, 1974). These tests also
have diagnostic possibilities for special programs (De Avila & Havassy,
1975), including the identification of gifted children (Bernal, 1974).

Similarly, a judicious sample of classrooms might be administered a
test of cognitive style. The literature on the relevance of cognitive style
to instructional effectiveness and teacher-student relations establishes the
importance of this variable for education (Witkin, Note 13) and
cross-cultural research (Witkin, 1967). Then, too, instructional techniques
may have differential effects on language minority children with different
cognitive styles (Holtzman, Goldsmith, & Barrera, 1979).

In the affective domain, attitude measurements should include the
esteem which language minority and majority populations have for each other
and the attitudes of the language minority group to the use of their
language and toward their own ethnic group. Monitoring these attitudes,
particularly as language minority children grow in their English proficiency,
should be one way of estimating some of the programmatic effects of concern
to language minority populations and rounding out the evaluation of the New
FT models.

In this section we have discussed the need for a major R&D effort to
develop an adequate, multiculturally appropriate test of English language
proficiency, at least one test of proficiency in a non-English language,
and a complementary general technique for testing the native language
proficiency of numerically small ethnic groups for which quality,
standardized assessments are not likely to become available. So far as
subject matter achievement testing of language minority populations is
concerned, the better extant standardized instruments (both norm-referenced
and criterion-referenced) can be made to suffice so long as LEP children are
not placed at risk. Piagetian measures of intellectual development and tests
of cognitive style round out the cognitive domain. In the affective domain
interethnic and language attitudes should be included in the New FT's plans
to evaluate programmatic effects.

Cautions in Evaluation

The discussions about instrumentation, measurement, and bilingual
education in this paper have implications for evaluation in the New FT.
This paper is not devoted to evaluation, but since the evaluation of
bilingual programs is fraught with difficulties, as evidenced by the AIR
study (Danoff, 1978) and its aftermath (see, for example, O'Malley, Note
14), a few problem areas will be identified.

One has already been mentioned, the preponderance of chance scores.
O'Malley (Note 14) noted in his review of the AIR data that even when
averages seemed to favor the children in the bilingual programs, they were
rarely higher than the 20th percentile on national norms. It is clear,
then, that data collected in compensatory programs are often highly skewed
positively, and that significant proportions of language minority students
score at or below chance on multiple-choice tests. If some of the
suggestions for protecting LEP students and finding alternative achievement
measures made in this paper are followed, more usable data should result.
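
How large the chance-score problem is at a given site can be estimated
with simple arithmetic. The sketch below assumes every item on a
multiple-choice test has the same number of options, so that the expected
raw score from blind guessing is the number of items divided by the number
of options; the function names and the sample scores are hypothetical.

```python
def chance_score(n_items, n_options):
    """Expected raw score from blind guessing on a multiple-choice test."""
    return n_items / n_options


def share_at_or_below_chance(raw_scores, n_items, n_options):
    """Proportion of raw scores at or below the chance-level score."""
    if not raw_scores:
        raise ValueError("raw_scores must not be empty")
    threshold = chance_score(n_items, n_options)
    return sum(score <= threshold for score in raw_scores) / len(raw_scores)


# Hypothetical raw scores on a 40-item, four-option subtest.
scores = [8, 10, 12, 25]
print(chance_score(40, 4))
print(share_at_or_below_chance(scores, 40, 4))
```

Reporting this proportion alongside the mean would alert readers when an
average rests largely on scores that carry little information.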

Another caution which needs to be observed concerns the imposition of
unreasonable standards of performance on LEP children. A second language
is not acquired like skills in an academic subject. In the past language
minority students were seen as making considerable improvements in English
and in tested academic achievement only in the later elementary grades
(USGAO, 1976; O'Malley, Note 14). Cummins' (1980) work suggests that CALP
takes time to develop, and that if it doesn't develop in the native language
it may never develop at all. Since FT has limited itself in the past to
the early elementary years, it may not be possible to show massive growth in
English language proficiency and academic achievement without a followup
study.

Placing LEP children in non-bilingual comparison classes is ethically
questionable and often unfeasible (Bissell, 1980). In some states,
furthermore, it is illegal. This writer's experience indicates that in
public school settings the exigencies of teaching the children most in need
make a shambles out of randomization efforts. Designs which take advantage
of intersite, interclassroom variability in student characteristics and
instructional approaches should be used, since these may prove more useful
than using models as independent variables (House et al., 1978;
Rodriguez-Brown, 1978).

Lack of process and contextual data restricts the interpretation of
effects. Obtaining data on program characteristics is crucial
(Rodriguez-Brown, 1978), and such variables have been recommended herein.
Ethnographic monitoring (Hymes, 1979) should also be considered in a sample
of sites, since this may gain data from another perspective not only on
processes and contexts but also on effects, especially on unanticipated
outcomes.

High student attrition can be expected to occur in the New FT among
language minority students generally, if the Title VII experience is any
indicator (see Ligon, 1980).

...for many schools large attrition rates indicate... unsystematic
"exits"... due to the exigency of serving the students most in need
with limited resources or the recalcitrance of some local school
administrators who would sabotage the program by convincing the parents of
moderately well achieving students to sign waivers because their children
presumably "don't... need the program anymore." The cumulative
effect of these practices is probably to depress the average scores of
the remaining project students (Bernal, 1980a).

Special cautions and agreements between the New FT and participating schools
are in order, else student cohorts may be capriciously dismembered. Large
numbers of students and classrooms should be obtained whenever possible.

Conclusions



The inclusion of language minority students in the New FT poses great
challenges and opportunities for curricular, psychometric, and evaluative
innovation. This paper, in delineating variables of interest to the New FT
and the means of measuring them, has hopefully disabused us of any facile
notion that merely including these students and setting aside an
instructional model or two for them will suffice. The New FT will
doubtlessly have to accommodate language minority students in ways never
envisioned in the 1960s. These challenges should be met creatively, not just
expeditiously, in the tradition of Follow Through, which is to bring the
best of educational theory into the realm of educational practice.